Efficient feature size reduction via predictive forward selection

Authors

  • Matthias Reif
  • Faisal Shafait
Abstract

Most widely used pattern classification algorithms, such as Support Vector Machines (SVM), are sensitive to irrelevant or redundant features in the training data. Automatic feature selection algorithms aim to select a subset of the features in a given dataset so that the accuracy of the subsequent classifier is maximized. Feature selection algorithms generally fall into two broad categories: algorithms that do not take the subsequent classifier into account (the filter approaches), and algorithms that evaluate the subsequent classifier for each considered feature subset (the wrapper approaches). Filter approaches are typically faster, but wrapper approaches typically achieve higher accuracy. In this paper, we present an algorithm, Predictive Forward Selection, based on the widely used wrapper approach forward selection. Using ideas from meta-learning, the number of required evaluations of the target classifier is reduced by exploiting knowledge gained during past feature selection runs on other datasets. We have evaluated our approach on 59 real-world datasets with a focus on SVM as the target classifier. We present comparisons with state-of-the-art wrapper and filter approaches, as well as one embedded method for SVM, in terms of accuracy and run time. The results show that the presented method reaches the accuracy of traditional wrapper approaches while requiring significantly fewer evaluations of the target algorithm. Moreover, our method achieves statistically significantly better results than the filter approaches and the embedded method. © 2013 Elsevier Ltd. All rights reserved.
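To make the cost that the abstract refers to concrete, here is a minimal sketch of plain (non-predictive) sequential forward selection that counts how often the target classifier would be evaluated. It is not the authors' Predictive Forward Selection, whose details are not given in the abstract. The names `forward_selection`, `evaluate`, and `toy_score` are hypothetical, and `toy_score` is an invented stand-in for something like the cross-validated accuracy of an SVM on a feature subset.

```python
from typing import Callable, FrozenSet, List, Tuple


def forward_selection(
    n_features: int,
    evaluate: Callable[[FrozenSet[int]], float],
) -> Tuple[List[int], float, int]:
    """Greedy sequential forward selection (plain wrapper approach).

    `evaluate` is an assumed callback that scores a feature subset,
    e.g. by training and validating the target classifier on it.
    Returns (selected feature indices, best score, number of evaluations).
    """
    selected: List[int] = []
    best_score = float("-inf")
    n_evaluations = 0
    remaining = set(range(n_features))

    while remaining:
        step_best_feature = None
        step_best_score = best_score
        # Try adding each remaining feature; every try costs one evaluation.
        for f in sorted(remaining):
            score = evaluate(frozenset(selected + [f]))
            n_evaluations += 1
            if score > step_best_score:
                step_best_feature, step_best_score = f, score
        # Stop as soon as no single added feature improves the score.
        if step_best_feature is None:
            break
        selected.append(step_best_feature)
        remaining.remove(step_best_feature)
        best_score = step_best_score

    return selected, best_score, n_evaluations


def toy_score(subset: FrozenSet[int]) -> float:
    # Hypothetical scoring function: features 0 and 3 help, others hurt.
    relevant = {0, 3}
    return len(subset & relevant) - 0.1 * len(subset - relevant)


selected, score, n_evals = forward_selection(5, toy_score)
print(selected, score, n_evals)
```

Even in this five-feature toy case the loop calls `evaluate` once per remaining candidate in every step, so the number of classifier evaluations grows roughly quadratically with the number of features. That count is the quantity the abstract says the predictive variant reduces.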

Similar resources

Applying Combined Approach of Sequential Floating Forward Selection and Support Vector Machine to Predict Financial Distress of Listed Companies in Tehran Stock Exchange Market

Objective: Nowadays, financial distress prediction is one of the most important research issues in the field of risk management that has always been interesting to banks, companies, corporations, managers and investors. The main objective of this study is to develop a high performance predictive model and to compare the results with other commonly used models in financial distress prediction M...

Stable Feature Selection: Theory and Algorithms

Feature selection plays an important role in knowledge discovery from many application domains with high-dimensional data. Many feature selection algorithms have been developed and shown to be successful at improving the predictive accuracy of learning models while reducing feature space dimensionality and model complexity. Besides high accuracy, the stability of feature selection, the insensitivity of th...

Stepwise regression for unsupervised learning

I consider unsupervised extensions of the fast stepwise linear regression algorithm [5]. These extensions allow one to efficiently identify highly-representative feature variable subsets within a given set of jointly distributed variables. This in turn allows for the efficient dimensional reduction of large data sets via the removal of redundant features. Fast search is effected here through th...

Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

Background: One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and which are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward or stepwise procedures, to further elaborated ...

Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps

Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...

Journal:
  • Pattern Recognition

Volume 47  Issue

Pages  -

Publication date: 2014